Genome-Wide Association Analysis of Salt-Tolerant Traits in Terrestrial Cotton at Seedling Stage

Soil salinization is one of the main abiotic stress factors affecting agricultural production worldwide, and salt stress has a significant impact on plant growth and development. Cotton is one of the most salt-tolerant crops. Therefore, the selection and utilization of salt-tolerant germplasm resources and the mining of salt resistance genes play important roles in improving cotton production in saline–alkali soils. In this study, we analysed the population structure and genetic diversity of a total of 149 elite Gossypium hirsutum cultivar accessions, including 137 collected from China and 12 collected from around the world. The Illumina Cotton SNP 70 K array was used to obtain genome-wide single-nucleotide polymorphism (SNP) data for the 149 accessions, and 18,430 highly consistent SNP loci were obtained after filtering. Principal component analysis (PCA) divided the 149 accessions into two subgroups: subgroup 1 with 78 materials and subgroup 2 with 71 materials. Using the SNP and other marker genotype results, genome-wide association analysis (GWAS) was carried out under salt stress for the salt tolerance traits (3d Germination potential, 3d Radicle length drop rate, 7d Germination rate, 7d Radicle length drop rate, 7d Germination weight, 3d Radicle length, 7d Radicle length, Relative Germination potential, Relative Germination rate and 7d Radicle weight drop rate) and for the salt tolerance index traits (3d Germination potential index, 3d Radicle length index, 7d Radicle length index, 7d Radicle weight index and 7d Germination rate index). A total of 27 SNP markers closely related to the salt tolerance traits and 15 SNP markers closely related to the salt tolerance index were detected.
At the SNP loci associated with the phenotypes, six genes related to plant salt tolerance, Gh_D01G0943, Gh_D01G0945, Gh_A01G0906, Gh_A01G0908, Gh_D08G1308 and Gh_D08G1309, were detected, and they were found to be involved in intracellular transport, sucrose synthesis, osmotic pressure balance, transmembrane transport, N-glycosylation, auxin response and cell expansion. This study provides a theoretical basis for the selection and breeding of salt-tolerant upland cotton varieties.


Introduction
Soil salinization is one of the main abiotic stress factors affecting agricultural production worldwide, and salt stress has significant impacts on plant growth and development. Under salt treatment, seed germination, root length, plant height and fruit development are significantly inhibited [1]. Salt stress can decrease cotton yield by up to 50-67% [2]. The availability of salt-tolerant varieties would expand the area of cotton production by

Group Structure
Controlling for population structure is important in GWAS because population stratification can produce spurious associations between genotypes and phenotypes [17,18]. In total, 18,430 highly consistent SNP sites were obtained. PCA showed that the 149 upland cotton accessions could be divided into two subgroups: subgroup 1 (marked in red), consisting of 78 materials, and subgroup 2, consisting of 71 materials, with PC1 and PC2 explaining 9.25% and 5.2% of the variance, respectively. A phylogenetic tree constructed from the SNP data also divided the 149 accessions into two subgroups, which is consistent with the PCA grouping (Figure 1B). The admixture model was used in ADMIXTURE V250 software. First, the number of subgroups (K) was set to 1-20, with three repetitions for each K value. Assuming that each site was independent, the length of the Markov chain Monte Carlo (MCMC) burn-in period was set to 10,000 iterations, and the number of MCMC iterations after burn-in was set to 1,000,000. The optimal K value was selected according to the maximum likelihood principle to determine the number of subgroups and the group structure. The cross-validation error (CV error) was calculated for K values from 2 to 20. In this experiment, using the Q value calculation and Structure software, the CV error was smallest when K was equal to 18 (Figure 1C). Taking the corresponding Q-matrix at K = 18 as the covariate could reasonably eliminate spurious associations and improve the GWAS accuracy.
The cotton population showed a certain distribution gap, but most varieties were clustered together. This corresponds to the fact that the early Chinese upland cotton varieties were mainly introduced from the United States and the Soviet Union, and that breeding used the introduced materials as parents. The results showed that the division of the subgroups was significantly correlated with the material source, indicating that the genetic background of the resources was relatively homogeneous. For the subgroups, the results of the population structure analysis were in line with the evolutionary trends of the genetic background during breeding.
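The PCA step described above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 allele counts and uses a plain NumPy SVD; the source does not state which PCA software was used, and the simulated genotypes below are purely hypothetical.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of a (n_samples, n_snps) genotype matrix coded 0/1/2 (assumed coding).

    Returns the sample scores on the leading components and the fraction
    of total variance explained by each of those components.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)                 # centre each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance fraction per PC
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Hypothetical toy data: two groups of 20 samples with different allele frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.3, size=200), 0.05, 0.95)
toy = np.vstack([
    rng.binomial(2, freq_a, size=(20, 200)),
    rng.binomial(2, freq_b, size=(20, 200)),
])
scores, explained = genotype_pca(toy)
```

Plotting the first column of `scores` against the second gives the kind of scatter plot from which subgroups are read off; `explained` corresponds to the percentages reported for PC1 and PC2.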

Material Heterozygosity
The individual heterozygosity analysis found that 95% of the cotton accessions were less than 30% heterozygous, and 80% of the individual materials were less than 5% heterozygous (Figure 2). Possible misalignment caused by homoeologous exchanges (HEs) is prevalent in allotetraploid crops, but, in cotton, HEs remain at a low level according to previous studies and have little effect on the results [19]. Figure 3 shows that the 149 cotton varieties can be divided into two subgroups. Among the 149 varieties, the genetic relationships between most varieties were weak (the yellow parts in the figure), and the genetic relationships between a few materials were very close (the dark red parts in the figure).
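As an illustration of how per-accession heterozygosity can be computed, the sketch below counts heterozygous calls among non-missing genotypes. The 0/1/2 coding and the use of -1 for missing calls are assumptions made for this example, not details from the study.

```python
import numpy as np

def heterozygosity_rate(genotypes):
    """Fraction of called genotypes that are heterozygous, per sample.

    genotypes: (n_samples, n_snps) array coded 0/1/2, with -1 for a
    missing call (assumed coding).
    """
    g = np.asarray(genotypes)
    called = g >= 0          # non-missing calls
    het = g == 1             # heterozygous calls
    return het.sum(axis=1) / called.sum(axis=1)

# Hypothetical toy genotypes for three samples and four SNPs.
toy = np.array([
    [0, 1, 2, 2],
    [1, 1, -1, 0],
    [0, 0, 2, 2],
])
rates = heterozygosity_rate(toy)   # one value per sample
```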

Analysis of Linkage Disequilibrium
LD decreases as the physical distance between SNPs on the chromosome increases. The analysis found that the LD decay distance of the 149 samples was 432 kb (r² = 0.5) (Figure 4), which is higher than the 296 kb reported for Chinese upland cotton material in a previous study [20]. This result further shows that the genetic diversity of the selected material is reduced. The degree of genetic differentiation among cotton accessions collected in China is relatively low, which is related to the trend towards homogenization in Chinese cotton breeding.

Phenotypic Statistical Analysis
Best linear unbiased prediction (BLUP) values were obtained for a total of five salt tolerance index traits: 3d Germination potential index, 3d Radicle length index, 7d Radicle length index, 7d Radicle weight index and 7d Germination rate index. Figure 5 shows that the phenotypic distributions of the five traits were all normal (p-value > 0.05), indicating that these traits are typical quantitative traits controlled by minor-effect polygenes. Pearson correlation coefficients between traits were calculated in R, and the correlations between different traits were found to be low (Table 1).
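The trait correlation step was carried out in R in the study; the following is an equivalent minimal sketch in Python, shown only to make the calculation concrete. The trait values are hypothetical.

```python
import numpy as np

def pearson_matrix(traits):
    """Pairwise Pearson correlations between trait columns.

    traits: (n_accessions, n_traits) array; returns (n_traits, n_traits).
    """
    return np.corrcoef(np.asarray(traits, dtype=float), rowvar=False)

# Hypothetical values for three traits measured on four accessions.
toy = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
])
corr = pearson_matrix(toy)
```

Off-diagonal entries of `corr` close to zero would correspond to the low between-trait correlations reported in Table 1.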

Association Analysis of Salt Tolerance Traits
Based on the identification results of the morphological, physiological, biochemical and yield traits of the specific germplasms of upland cotton under saline-alkali stress, an association analysis of the salt tolerance traits was carried out, and the favorable alleles related to salt tolerance in these germplasms were located. A total of 27 SNP sites related to salt tolerance traits were detected (Table 2): 3 SNP sites related to 7d Radicle length; 10 SNP sites related to 7d Radicle length drop rate; 3 SNP sites related to 7d Germination rate; 3 SNP sites related to 7d Germination weight; and 8 SNP sites related to Relative germination rate. The marker loci were scattered on six cotton chromosomes, A01, D01, D05, D08, D11 and D13, without clustering, and six QTLs related to salt tolerance traits were located on different chromosomes (Table 2). The SNP markers and QTL sites closely related to salt tolerance traits can be applied to the molecular marker-assisted selection of cotton salt tolerance.

Association Analysis of Salt Tolerance Index Traits among Cotton Accessions
The GWAS results for the BLUP values of the salt tolerance index traits under the optimal model were summarized, and the results are shown in Table 2. A total of 15 significant SNP-trait associations were detected (−log10(p) > 3) (Table 3). Of the five index traits, four had significant SNPs, whereas the 7d Radicle weight index had no significant locus. This may be because this trait is more complicated and controlled by multiple minor QTLs (Figure 6). According to the Bonferroni correction principle, −log10(p) > 3.97 (p = 1/n, where n is the number of SNPs in this study) should be the threshold, but the Bonferroni correction is too stringent, and no significant SNPs could be identified for two traits with this threshold. To obtain more associated SNPs, SNP markers significantly associated with salt tolerance-related traits were identified according to −log10(p) > 3.0 [21,22].
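The relationship between the number of tests and the significance threshold can be sketched as follows. The function names are hypothetical, and `alpha=1.0` is chosen only to mirror the p = 1/n convention used in the text; a conventional Bonferroni cutoff would use alpha = 0.05.

```python
import math

def neg_log10_threshold(n_tests, alpha=1.0):
    """Return -log10 of the per-test p-value cutoff alpha / n_tests.

    alpha=1.0 mirrors the p = 1/n rule in the text (an assumption of
    this sketch); alpha=0.05 gives the usual Bonferroni cutoff.
    """
    return -math.log10(alpha / n_tests)

def is_significant(p_value, cutoff):
    """True if -log10(p_value) exceeds the chosen cutoff."""
    return -math.log10(p_value) > cutoff

# Illustrative values only: 10,000 tests give a cutoff of 4.0.
example_cutoff = neg_log10_threshold(10_000)
# A marker with p = 5e-4 passes the relaxed cutoff of 3.0 but not 3.97.
passes_relaxed = is_significant(5e-4, 3.0)
passes_strict = is_significant(5e-4, 3.97)
```

The example shows why relaxing the cutoff from 3.97 to 3.0 admits additional markers: a p-value of 5e-4 has −log10(p) of about 3.3.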

Candidate Gene Screening
To investigate the expression patterns of these genes at the seedling stage under salt stress, and to further screen for candidate genes involved in the salt response, the expression levels of these genes in seedlings at 1, 3, 6 and 12 h under 400 mM salt were analyzed. The public transcriptome data sets were retrieved from ccNET (https://structralbiology.cau.edu.cn.gossypium (accessed on 22 February 2020)) [23]. Five notable SNP sites on chromosome A01 and two notable SNP sites on chromosome D01 associated with the 7d Radicle length drop rate were detected. Three genes, Gh_D01G0943, Gh_D01G0944 and Gh_D01G0945, were found within 500 kb upstream and 500 kb downstream of the two relevant SNP sites on D01; ten genes, Gh_A01G0905, Gh_A01G0906, Gh_A01G0907, Gh_A01G0908, Gh_A01G0909, Gh_A01G0910, Gh_A01G0911, Gh_A01G0912, Gh_A01G0913 and Gh_A01G0914, were found within 500 kb upstream and 500 kb downstream of the five relevant SNP sites on A01. At the same time, six identical associated SNP loci were detected for the Relative germination rate and 7d Germination rate index, all of which were on chromosome D08. Ten genes were detected within 500 kb upstream and downstream of the associated SNP loci: Gh_D08G1305, Gh_D08G1306, Gh_D08G1307, Gh_D08G1308, Gh_D08G1309, Gh_D08G1310, Gh_D08G1311, Gh_D08G1312, Gh_D08G1313 and Gh_D08G1314 (Figure 7). These genes were aligned to the transcriptome data and screened for differential expression, and, finally, 6 candidate genes associated with plant salt stress were identified by combining gene functional annotation with previous findings (Table 4). The transcriptome results showed that the expression of Gh_D01G0945, Gh_A01G0906 and Gh_D08G1308 was down-regulated under salt stress, whereas the expression of Gh_A01G0908 and Gh_D08G1309 was up-regulated under salt stress. The expression of Gh_D01G0943 reached its peak at 3 h of salt stress (Figure 8).
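The window-based gene search described above (500 kb on either side of an associated SNP) can be sketched as a simple interval overlap. The gene identifiers and coordinates below are placeholders invented for illustration; they are not the positions of the genes named in this study.

```python
def genes_in_window(snp, genes, flank=500_000):
    """Return IDs of genes overlapping [pos - flank, pos + flank] on the SNP's chromosome.

    snp:   (chromosome, position)
    genes: iterable of (gene_id, chromosome, start, end)
    """
    chrom, pos = snp
    lo, hi = pos - flank, pos + flank
    return [
        gene_id
        for gene_id, gene_chrom, start, end in genes
        if gene_chrom == chrom and start <= hi and end >= lo
    ]

# Hypothetical annotation records (placeholder IDs and coordinates).
toy_genes = [
    ("geneA", "D01", 100_000, 105_000),
    ("geneB", "D01", 900_000, 910_000),
    ("geneC", "D01", 1_700_000, 1_705_000),
    ("geneD", "A01", 400_000, 410_000),
]
hits = genes_in_window(("D01", 600_000), toy_genes)
```

Here `hits` contains the two placeholder genes on D01 that fall inside the 1 Mb window; the gene on A01 and the distant gene on D01 are excluded.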

Alignment of Salt Resistance-Related Genes and Arabidopsis Homologous Sequences
The comparison of Gh_D01G0943 with Tair3 showed that the homologous gene in Arabidopsis thaliana is AT1G75680, with 76% homology, which encodes GH9B7 (glycosyl hydrolase 9B7). The comparison of Gh_D01G0945 with Tair3 showed that the homologous gene in Arabidopsis thaliana is AT1G77210, with 75% homology, which encodes STP14 (sugar transporter 14). The comparison of Gh_A01G0906 with Tair3 showed that the homologous gene in Arabidopsis thaliana is AT2G21220, with 78% homology, which belongs to the SAUR-like auxin-responsive protein family. The comparison of Gh_A01G0908 with Tair3 showed that the homologous gene in Arabidopsis thaliana is AT1G19850, with 83% homology, which encodes MP (Transcriptional factor B3 family protein/auxin-responsive factor AUX/IAA-like protein). The comparison of Gh_D08G1308 with Tair3 showed that the homologous gene in Arabidopsis thaliana is AT5G18010, with 83% homology, which encodes SAUR19 (SAUR-like auxin-responsive protein family). The comparison of Gh_D08G1309 with Tair3 showed that the homologous gene in Arabidopsis thaliana is AT4G02280, with 75% homology, which encodes SUS3 (sucrose synthase 3) (Figure 9, Table 5). The Arabidopsis thaliana orthologs of both Gh_A01G0906 and Gh_D08G1308 belong to the SAUR-like auxin-responsive protein family, so the two genes may have similar functions; the transcriptomic results also showed similar expression patterns for both genes.

Figure 9. Gene sequence BLAST results.

Table 5. Genes associated with salt stress.

Target Gene Identification Based on GWAS
With the efficient development of genotyping technology, SNP markers have the advantages of wide distribution, high throughput, low cost and high accuracy. Genome-wide association analysis based on SNP genetic markers has become the first choice for analyzing the complex traits of humans, animals and plants [24]. Some significant SNP markers in linkage disequilibrium can show a higher degree of association than the SNPs that actually cause the phenotypic variation [25,26]. A previous study identified 31 SSRs and 8 SNPs associated with salt tolerance based on relative seed germination rate under seven environments, using 503 upland cotton accessions, 179 SSRs and 11,975 array-derived SNPs [27]. Du et al. (2016) performed association analysis of 304 upland cotton cultivars and identified 95 significant associations for 10 salt tolerance-related traits at the germination and seedling stages [28]. Jia et al. (2014) identified three simple sequence repeat (SSR) markers significantly associated with the relative survival rate under salt stress through association mapping using 323 G. hirsutum germplasms. Two haplotypes related to fibre length and fibre strength were identified on chromosomes At07 and Dt11 [29]. Reddy et al. (2017) used genotyping-by-sequencing (GBS) to develop 10,129 polymorphic SNP markers from upland cotton and sea island cotton and analysed linkage disequilibrium (LD) in both species. A total of 142 and 282 blocks were excavated from sea island cotton [30].
In this study, the Illumina Cotton SNP 70K array was used to develop 18,430 SNP markers across the whole genome. On this basis, genome-wide association analysis was used to identify favorable sites related to salt tolerance traits and the salt tolerance index. A total of 27 SNP sites related to salt tolerance traits were detected. In the GWAS of the BLUP values of the salt tolerance index traits under the optimal model, a total of 15 significant SNP-trait associations were detected.

Functional Analysis of Candidate Genes
Candidate genes are a class of genes whose expression on the chromosome is not clear. They are involved in the phenotypic expression of organisms, and association analysis suggests that they are related to a certain part of the genome. Such genes may be structural genes, regulatory genes or genes that affect the expression of traits in biochemical metabolic pathways. The functional insufficiency of the candidate gene is known, and whether it is related to salt resistance has been verified. According to the screening, functional annotations can be assigned, or Arabidopsis homologous genes can be found from the gene information. This method has been previously reported to target genes that are clearly related to salt tolerance. GWAS is a fast and powerful method for mining regulatory genes through crop indicators. A number of genes conferring salt tolerance, such as MKK [31], ZFP [32], NAC [33], ERF [34], DREB [32], GhMT3a [35], MPK [36] and tonoplast Na+/H+ antiporter [37], have been identified in cotton. A previous study reported that, among a total of 223 genes within a salt tolerance QTL interval on D01 (37,771-1,942,912), four candidate genes (GhPIP3A, GhSAG29, GhTZF4 and GhTZF4a) showed differential expression between sensitive and tolerant accessions under salt stress [10]. In this study, the comparison between our GWAS of the BLUP values of the salt tolerance index traits under the optimal model and previous reports showed that most of the 15 associations were novel loci. Abdelraheem et al. (2017, 2018) reported that each chromosome in each of the six pairs of homologous chromosomes (i.e., A01/D01, A03/D03, A08/D08, A09/D09, A12/D12 and A13/D13) had at least one ST (salt stress) QTL > 470,000 SNPs surveyed [38,39]. A total of 16 and 27 QTLs were identified for dry shoot weight and plant height, respectively, under both water-limited and saline environments, while 11 QTLs were found to be commonly linked with tolerance to drought and saline environments.
Five striking SNP sites on chromosome A01, two striking SNP sites on chromosome D01 and six striking SNP sites on chromosome D08 were detected. The genes Gh_D01G0943, Gh_D01G0945, Gh_A01G0906, Gh_A01G0908, Gh_D08G1308 and Gh_D08G1309, which are associated with plant salt stress, were detected at the associated SNP loci.
Sucrose synthase 3 (SUS3) catalyzes the reversible conversion of sucrose and a nucleoside diphosphate into the corresponding nucleoside diphosphate-glucose and fructose; a member of the glycosyltransferase family of enzymes, sucrose synthase 3 is ubiquitous in the plant kingdom and catalyzes in vivo and in vitro the synthesis and cleavage of sucrose. Sucrose synthase is one of the key enzymes in the plant carbohydrate metabolism and regulates the assignment of sucrose, a kind of product of photosynthesis, into a variety of plant metabolic processes. Sucrose synthase also plays pivotal roles during plant growth and development [40,41]; SUS is important for metabolite homeostasis and the timing of seed development and is a key enzyme of carbon metabolism in the heterotrophic tissues of plants [42,43]. SUS can transport sucrose into a variety of pathways, the most important of which provides a precursor substance (UDP-glucose) for the biosynthesis of cell wall polymers and starch [44]. In addition, SUS plays an important role in the process of growth, development and the metabolism of sink organs, and also plays an important role in the adaptation of plants to abiotic stress, such as hypoxia and cold; under hypoxia, the activity of SUS increased. SUS3 is induced in various organs under dehydration conditions including leaves deprived of water or submitted to osmotic stress as well as late-maturing seeds [45].
Sugar transporter 14 (STP14), as a transport protein, is a galactose transporter expressed in both source and sink tissues with the highest levels in the endosperm; it affects galactose transmembrane transporter activity, carbohydrate transmembrane transporter activity and sugar-hydrogen symporter activity [46]. STP are proton-coupled symporters responsible for the uptake of glucose from the apoplast into plant cells. They are integral to organ development in symplastically isolated tissues such as seed, pollen and fruit. Additionally, STPs play a vital role in plant responses to stressors such as dehydration and prevalent fungal infections such as rust and mildew [47]. STPs play a role in senescence and programmed cell death, and participate in the recycling of sugars derived from cell wall degradation [46,48,49].
Glycosyl hydrolase 9B7 (GH9B7) is an enzyme that hydrolyzes glycosidic bonds and plays an important role in the hydrolysis and synthesis of biological sugars and sugar conjugates. When the enzyme catalyzes the glycosidic reaction, if the oxygen atom of a water molecule attacks the anomeric carbon on the receptor glucose, hydrolysis occurs, but if the oxygen atom on the hydroxyl group of glucose attacks the anomeric carbon on the receptor glucose, transglycosylation occurs [50]. GH9B7 belongs to the glucosidases, whose main function is to hydrolyze the glucoside bond, releasing glucose as a product. They are an indispensable class of enzymes in the glucose metabolism pathway of living organisms and are involved in the carbohydrate metabolic process [51]. The process of N-glycosylation involves the participation of various enzymes, mainly glycosyltransferases, which transfer active donors (usually NDP-sugar) to molecules of recipient substances such as sugars, proteins and lipids, and enzymes whose catalytic activity is to trim various glycan chains; together they complete N-glycosylation. According to current research results, the N-glycosylation modification of proteins plays an important role in the processes of protein folding and transportation [52].
Transcriptional factor B3 family protein/auxin-responsive factor AUX/IAA-like protein (MP) is an auxin-responsive transcription factor that is required for primary root formation and vascular development [53]. It plays a critical role in Arabidopsis embryonic root initiation; MP transcriptionally initiates the ground tissue lineage and acts upstream of the regulatory network that controls ground tissue patterning and maintenance [54]. In the shoot, cell polarity patterns follow MP expression, which in turn follows auxin distribution patterns [55]. Signaling through MP/AUXIN RESPONSE FACTOR 5 is necessary for the formation of shoots from Arabidopsis calli [56]. Aux/IAA auxin perception mediates rapid cell wall acidification and growth of Arabidopsis hypocotyls [57]. AUX/IAA is a transcriptional repressor that has been shown to play a vital role in the auxin signaling pathway [58].
The plant hormone auxin controls numerous aspects of plant growth and development by regulating the expression of hundreds of genes. SMALL AUXIN UP RNA (SAUR) genes comprise the largest family of auxin-responsive genes; the SAUR19-24 subfamily of auxin-induced SAUR genes promotes cell expansion [59]. SAUR proteins provide a mechanistic link between auxin and plasma membrane H+-ATPases (PM H+-ATPases) in Arabidopsis thaliana. Plants overexpressing stabilized SAUR19 fusion proteins exhibit increased PM H+-ATPase activity, and the increased growth phenotypes conferred by SAUR19 overexpression are dependent upon normal PM H+-ATPase function. SAUR19 stimulates PM H+-ATPase activity by promoting phosphorylation of the C-terminal autoinhibitory domain. SAUR19, as well as additional SAUR proteins, interacts with the PP2C-D subfamily of type 2C protein phosphatases. These phosphatases are inhibited upon SAUR binding, act antagonistically to SAURs in vivo, can physically interact with PM H+-ATPases and negatively regulate PM H+-ATPase activity [60]. SAURs play a central role in auxin-induced plant growth, but can also act independently of auxin, in a tissue-specific manner regulated by various other hormone pathways and transcription factors [61].
Auxin functions, at least in part, by regulating a set of early auxin response genes: Aux/IAAs and SAURs [62]. Auxin is perceived by receptors including TRANSPORT INHIBITOR RESPONSE 1 (TIR1) and the closely related AUXIN SIGNALLING F-BOX (AFB) F-box proteins, which recruit Aux/IAA repressors to the SCF(TIR1/AFB) complex for ubiquitination and proteasome-mediated degradation, releasing the inhibition of AUXIN RESPONSE FACTORS (ARFs), and eventually activating auxin-induced gene expression [63]. The inhibition of root growth by salt stress is related to a decrease in auxin accumulation. The position of the auxin transporter AUX1 is changed by salt stress, so auxin transport may be related to the decreased accumulation of auxin in the roots [64]. The accumulation of carbohydrates such as sucrose plays an important role in alleviating stress damage, including osmotic protection, carbon source storage and ROS removal. ASISH et al. (2004) showed that the intracellular reducing sugar (sucrose and fructan) levels of different plant species are increased under salt stress [65]. In the salt stress response, the mechanisms or strategies that control the metabolism, transportation and balance of molecules, hormone metabolism, antioxidant metabolism and signal transduction play a vital role in the adaptation of plants to saline environments.

Conclusions
A total of 18,430 polymorphic SNP markers were developed and screened from natural populations using gene chip technology. These SNP markers were used to analyse the structure of the population to obtain the Q-matrix, and then the salt tolerance traits and salt tolerance index data were combined to conduct a genome-wide association analysis. The natural population can be divided into two subgroups. The genetic relationships between the cotton cultivars were weak, indicating that the genetic diversity of the cultivars is decreasing. The salt tolerance traits were associated with 27 significant SNP sites, and the salt tolerance index was associated with 15 significant SNP sites. The significant SNP sites were further analysed, and the salt tolerance-related genes Gh_D01G0943, Gh_D01G0945, Gh_A01G0906, Gh_A01G0908, Gh_D08G1308 and Gh_D08G1309 were detected. The sequences were compared with those of Arabidopsis thaliana to obtain the homologous genes AT1G75680, AT1G77210, AT2G21220, AT1G19850, AT5G18010 and AT4G02280. Analysis of the functions of these six genes revealed that the Arabidopsis thaliana homologs encode glycosyl hydrolase 9B7, sugar transporter 14, sucrose synthase 3, SAUR-like auxin-responsive family proteins and Transcriptional factor B3 family protein/auxin-responsive factor AUX/IAA-like protein. The sucrose-generating metabolic system, the transmembrane transport system and the regulation of the auxin response are highly active in the salt tolerance reaction of cotton; protecting the stability of the structure and function of membranes and macromolecules and maintaining the cellular osmotic pressure balance are therefore the key to the salt tolerance of cotton. This study further analysed the functions and expression patterns of cotton salt-tolerant genes and has reference value for analyzing the mechanism of cotton salt tolerance.

Test Materials
We sampled 149 modern G. hirsutum cultivars collected from the Chinese national medium-term cotton gene bank at the Institute of Cotton Research (ICR) of the Chinese Academy of Agricultural Sciences (CAAS) (Table 6).    : 0.1). The clean reads were anchored to the cotton reference genome using the Burrows-Wheeler Aligner (BWA). SAMtools software was used to convert the alignment files to BAM files. After the 63,058 probe sequences were aligned to the genome with BLAST, the best hit for each probe was taken as the position of the SNP on the reference genome. SnpEff 4.0 [70] software was used to obtain the locations of the variable sites (intergenic regions, gene regions or CDS regions) in the reference genome and the effects of the variations (synonymous mutations, nonsynonymous mutations, etc.).

Population Structure and Kinship Analysis
ADMIXTURE V250 [71] software was used to analyze the group structure of the research materials. For the research group, the number of subgroups (K value) was preset to 1-20 for clustering, the clustering results were cross-validated, and the optimal number of clusters was determined according to the lowest cross-validation error rate. SPAGeDi 1.3 [72] software was used to estimate the relative kinship between pairs of individuals in the natural population. The kinship value is a relative measure of the genetic similarity between two specific materials compared with the genetic similarity between any two materials. Therefore, when the kinship value between two materials was less than 0, it was set to 0. Five independent runs were performed; the number of populations (K) was set from 1 to 20; the burn-in time and the number of Markov chain Monte Carlo replications were set to 10,000. The optimal K value was determined by comparing the LnP (D) and ∆K based on the rate of change in LnP (D) [73]. A Q-matrix produced by STRUCTURE listed the estimated membership coefficients in each cluster for the subsequent association analysis.
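Two of the rules described above, choosing K by the lowest cross-validation error and setting negative kinship values to 0, can be written as a short sketch. The function names and all numbers are hypothetical; in practice the CV errors would be taken from the ADMIXTURE output and the kinship matrix from SPAGeDi.

```python
import numpy as np

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error.

    cv_errors: dict mapping K (int) to its CV error (float).
    """
    return min(cv_errors, key=cv_errors.get)

def clip_kinship(kinship):
    """Set negative pairwise kinship estimates to 0, as described in the text."""
    k = np.asarray(kinship, dtype=float)
    return np.where(k < 0, 0.0, k)

# Hypothetical CV errors for a few K values.
toy_cv = {2: 0.61, 3: 0.58, 4: 0.55, 5: 0.57}
chosen_k = best_k(toy_cv)

# Hypothetical 3 x 3 kinship matrix containing negative estimates.
toy_kinship = np.array([
    [1.00, -0.05, 0.20],
    [-0.05, 1.00, -0.10],
    [0.20, -0.10, 1.00],
])
clipped = clip_kinship(toy_kinship)
```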

Linkage Disequilibrium Analysis
On the same chromosome, the linkage disequilibrium between two SNPs within a certain distance (such as 1000 kb) can be calculated, and the strength of linkage disequilibrium is represented by r². The closer r² is to 1, the stronger the linkage disequilibrium. The SNP spacing is fitted against r², and a graph can be drawn to represent the variation of r² with distance. Generally, the closer the SNPs are, the larger r² is, and the farther apart the SNPs are, the smaller r² is. The distance at which r² drops to half of its maximum value is used as the LD decay distance (LDD). The longer the LDD is, the smaller the probability of recombination within the same physical distance; the shorter the LDD is, the greater the probability of recombination within the same physical distance. Plink2 [74] software was used for LD analysis.
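The definitions above can be made concrete with a minimal sketch. It approximates r² as the squared Pearson correlation of 0/1/2 genotype codes, which is an assumption of this example rather than a description of the Plink2 computation, and it reads the decay distance from a hypothetical, already-averaged r²-by-distance curve that decreases with distance.

```python
import numpy as np

def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs coded 0/1/2 (genotype-based r^2)."""
    r = np.corrcoef(snp_a, snp_b)[0, 1]
    return r * r

def ld_decay_distance(distances_kb, mean_r2):
    """First distance at which r^2 has fallen to half of its maximum value.

    distances_kb must be sorted in ascending order, and mean_r2 is assumed
    to decrease with distance. Returns None if r^2 never reaches half-maximum.
    """
    d = np.asarray(distances_kb, dtype=float)
    r2 = np.asarray(mean_r2, dtype=float)
    half = r2.max() / 2.0
    below = np.nonzero(r2 <= half)[0]
    return float(d[below[0]]) if below.size else None

# Hypothetical genotypes: two perfectly correlated SNPs.
r2_example = genotype_r2([0, 1, 2, 2], [0, 1, 2, 2])

# Hypothetical binned decay curve (distance in kb, mean r^2 per bin).
toy_dist = [10, 100, 200, 400, 800]
toy_r2 = [0.80, 0.60, 0.45, 0.38, 0.20]
ldd = ld_decay_distance(toy_dist, toy_r2)
```

With these toy values the maximum r² is 0.80, so the half-maximum level is 0.40, and the first bin at or below it is the 400 kb bin.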

Association Analysis of Salt Tolerance Traits
The TASSEL5.0 (http://www.maizegenetics.net/tassel (accessed on 26 February 2020)), EMMAX (http://genetics.cs.ucla.edu/emmax/ (accessed on 26 February 2020)) and FaST-LMM0.2.19 (https://www.microsoft.com/en-us/download/confirmation.aspx?id=52588 (accessed on 22 February 2020)) software packages were employed to perform association tests of salt tolerance-related traits. By combining the population SNP marker data with population structure and the phenotype data of the target trait, the regions or sites associated with the target trait can be located.

Salt Stress Conditions and Salt-Tolerant Trait Collection
The salt tolerance test during the germination period used upright double-layer filter paper rolls. Two pieces of filter paper, each 20 cm in length and width, were cut, and one piece was spread on the test bench and soaked with NaCl solution from a sprayer. Fifteen seeds were placed 2 cm down from the top of the filter paper. The filter paper was then rolled and placed vertically into the culture box. Approximately 30 rolled filter papers were placed in each culture box. The culture boxes were kept in a constant-temperature light incubator at 28 °C with a photoperiod of 10 h/14 h (L/D). The germination potential of seeds and the length of each seed were measured on the 3rd day, and the germination rate, radicle length and stem fresh weight of the seeds were measured on the 7th day. This process was repeated 3 times. The treatment concentrations of NaCl solution were 0 NaCl (CK) and 150.0 mmol/L NaCl. The formulas below convert the values measured under salt stress into values relative to the control. A seed was counted as germinated when its radicle reached half the length of the seed. Germination weight is the weight of all biological materials after germination.
Relative germination potential % = germination potential of treated seeds/germination potential of control seeds × 100%.
Relative germination rate % = germination rate of treated seeds/germination rate of control seeds × 100%.
Decrease rate % = (treatment traits − control traits)/control traits × 100%.
Salt tolerance index: Note: X_d and X_w are the measured values of a certain index of each material under salt stress conditions and control conditions, respectively, and X is the average value of this index under salt stress conditions.
Germination potential: GP = M_1/M. Note: M_1: Number of normal germinating grains within days of germination potential; M: Number of seeds to be tested.
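As a worked illustration of the germination potential, relative value and decrease rate formulas above, the following sketch applies them to one hypothetical material. The seed counts and radicle lengths are invented for the example. Note that, as the decrease rate formula is written, a trait that declines under salt stress gives a negative value.

```python
def germination_potential(germinated, tested):
    """GP = M_1 / M: normally germinated seeds over seeds tested."""
    return germinated / tested


def relative_value(treated, control):
    """Relative germination potential or rate, in percent of the control."""
    return treated / control * 100


def decrease_rate(treated, control):
    """(treatment - control) / control x 100, in percent."""
    return (treated - control) / control * 100


# Hypothetical counts for one material, 15 seeds per filter paper roll.
gp_control = germination_potential(12, 15)   # 0 NaCl (CK)
gp_treated = germination_potential(9, 15)    # 150.0 mmol/L NaCl
relative_gp = relative_value(gp_treated, gp_control)

# Hypothetical 7d radicle lengths in cm (control 4.0, treated 3.0).
radicle_drop = decrease_rate(3.0, 4.0)
```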
Statistical analysis of the phenotypes of salt tolerance-related traits was performed with SPSS. SAS software was used to compute the best linear unbiased prediction (BLUP) for salt tolerance traits with default parameters. Association analysis was performed for each trait under the MLM, GLM, CMLM, EMMAX and FaST-LMM models, with the population structure result used as a fixed effect. Because the number of environments was small and some false positives were expected, the CMLM model, which reduces false positives as far as possible, was adopted. The CMLMs were performed by simultaneously accounting for multiple levels of Q-matrix and K-matrix according to the methods described [79]. The mixed linear model formula of TASSEL software is as follows:
y = Xα + Qβ + Kµ + e
Note: SPAGeDi 1.3 [72] software was used to calculate the genetic relationship K between samples. The general linear model uses the population structure information Q, while the mixed linear model uses Q + K, that is, both the population structure and the genetic relationship information. X is the genotype and y is the phenotype. In the end, an association result is obtained for each SNP site.
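To make the difference between the two model families concrete, the following sketch fits only the general linear model part, y = Xα + Qβ + e, for a single SNP by ordinary least squares. It is not the CMLM procedure used in the study and does not reproduce TASSEL: the kinship term Kµ is omitted, an intercept is added as an assumption, Q is reduced to a single membership column, and all data values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def glm_marker_effect(y, genotype, q):
    """Least-squares estimate of alpha in y = mean + genotype*alpha + q*beta + e."""
    design = [[1.0, g, qi] for g, qi in zip(genotype, q)]
    p = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)[1]


# Hypothetical data: genotypes coded 0/1/2 and one subgroup membership column.
genotype = [0, 1, 2, 0, 1, 2]
q = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
# Noise-free phenotype built with a marker effect of 2, for illustration only.
y = [0.5 + 2.0 * g + 1.0 * qi for g, qi in zip(genotype, q)]
alpha_hat = glm_marker_effect(y, genotype, q)
```

A mixed linear model would additionally treat the kinship-structured term as a random effect, which requires variance component estimation and is left to the dedicated software.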

Prediction and Functional Annotation of Salt-Tolerant Candidate Genes
Independent significant SNP sites were selected from the GWAS results and LD calculations. The region extending 500 kb upstream and downstream of the physical position of each SNP site was used as the candidate gene query area. Genes in this area, or their Arabidopsis homologs, were identified together with their annotation information to narrow down the target candidate genes. NCBI, COTTONGEN, CNKI, Tair3 and other websites were used to annotate gene functions and compare homologous sequences.
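The window query described above can be sketched as a simple overlap test. The sketch assumes gene coordinates are available as records with a chromosome, start and end position; the record layout, gene identifiers and coordinates below are hypothetical placeholders, not genes from this study.

```python
WINDOW_BP = 500_000  # 500 kb on each side of the SNP, as in the text


def genes_in_window(snp_chrom, snp_pos, genes, window=WINDOW_BP):
    """Return IDs of genes overlapping [snp_pos - window, snp_pos + window]."""
    low = max(0, snp_pos - window)
    high = snp_pos + window
    return [
        gene["id"]
        for gene in genes
        if gene["chrom"] == snp_chrom and gene["start"] <= high and gene["end"] >= low
    ]


# Hypothetical gene records (coordinates in bp).
genes = [
    {"id": "gene_a", "chrom": "D01", "start": 900_000, "end": 905_000},
    {"id": "gene_b", "chrom": "D01", "start": 1_490_000, "end": 1_510_000},
    {"id": "gene_c", "chrom": "D01", "start": 1_600_000, "end": 1_610_000},
    {"id": "gene_d", "chrom": "A01", "start": 950_000, "end": 960_000},
]
candidates = genes_in_window("D01", 1_000_000, genes)
```

Here a gene counts as a candidate if any part of it overlaps the window; requiring the whole gene to lie inside the window would be a stricter alternative.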
Author Contributions: Z.Z. was the executor of the experimental design and experimental research of this study; Z.Z. and Z.S. completed the data analysis and the writing of the paper; J.W., Y.L., Y.X. and Z.G. participated in the experimental design, experimental data collection and test results analysis; X.L. and J.Z. were the architects and directors of the project, guiding experimental design, data analysis, paper writing and modification. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement:
The study did not involve humans or animals.

Informed Consent Statement:
This study did not involve human studies.

Data Availability Statement:
Data sharing is not applicable to this article as no new data were created or analyzed in this study. The file with the vcf with SNPs and the file with phenotyping have been uploaded as attachments.